AITopics | low-level agent

Collaborating Authors

low-level agent

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ReMA: Learning to Meta-think for LLMs with Multi-agent Reinforcement Learning

Neural Information Processing SystemsJun-22-2026, 07:17:08 GMT

arxiv preprint arxiv, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country: Asia (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Sample Complexity of Goal-Conditioned Hierarchical Reinforcement Learning

Neural Information Processing SystemsFeb-17-2026, 00:16:15 GMT

HRL lead to improved sample complexity, and how much of an improvement can it provide? Theoretical work on sample-complexity bound in Machine Learning has been integral to the development of the field.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany > Bavaria > Upper Franconia > Bayreuth (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Sample Complexity of Goal-Conditioned Hierarchical Reinforcement Learning

Neural Information Processing SystemsOct-9-2025, 06:55:55 GMT

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany > Bavaria > Upper Franconia > Bayreuth (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

A Hierarchical Deep Reinforcement Learning Framework for Traffic Signal Control with Predictable Cycle Planning

Gu, Hankang, Zhang, Yuli, Wang, Chengming, Jiang, Ruiyuan, Qiao, Ziheng, Fan, Pengfei, Jia, Dongyao

arXiv.org Artificial IntelligenceSep-4-2025

Deep reinforcement learning (DRL) has become a popular approach in traffic signal control (TSC) due to its ability to learn adaptive policies from complex traffic environments. Within DRL-based TSC methods, two primary control paradigms are ``choose phase" and ``switch" strategies. Although the agent in the choose phase paradigm selects the next active phase adaptively, this paradigm may result in unexpected phase sequences for drivers, disrupting their anticipation and potentially compromising safety at intersections. Meanwhile, the switch paradigm allows the agent to decide whether to switch to the next predefined phase or extend the current phase. While this structure maintains a more predictable order, it can lead to unfair and inefficient phase allocations, as certain movements may be extended disproportionately while others are neglected. In this paper, we propose a DRL model, named Deep Hierarchical Cycle Planner (DHCP), to allocate the traffic signal cycle duration hierarchically. A high-level agent first determines the split of the total cycle time between the North-South (NS) and East-West (EW) directions based on the overall traffic state. Then, a low-level agent further divides the allocated duration within each major direction between straight and left-turn movements, enabling more flexible durations for the two movements. We test our model on both real and synthetic road networks, along with multiple sets of real and synthetic traffic flows. Empirical results show our model achieves the best performance over all datasets against baselines.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2509.03118

Country: Asia > China (0.16)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Power Grid Control with Graph-Based Distributed Reinforcement Learning

Fabrizio, Carlo, Losapio, Gianvito, Mussi, Marco, Metelli, Alberto Maria, Restelli, Marcello

arXiv.org Artificial IntelligenceSep-4-2025

The necessary integration of renewable energy sources, combined with the expanding scale of power networks, presents significant challenges in controlling modern power grids. Traditional control systems, which are human and optimization-based, struggle to adapt and to scale in such an evolving context, motivating the exploration of more dynamic and distributed control strategies. This work advances a graph-based distributed reinforcement learning framework for real-time, scalable grid management. The proposed architecture consists of a network of distributed low-level agents acting on individual power lines and coordinated by a high-level manager agent. A Graph Neural Network (GNN) is employed to encode the network's topological information within the single low-level agent's observation. To accelerate convergence and enhance learning stability, the framework integrates imitation learning and potential-based reward shaping. In contrast to conventional decentralized approaches that decompose only the action space while relying on global observations, this method also decomposes the observation space. Each low-level agent acts based on a structured and informative local view of the environment constructed through the GNN. Experiments on the Grid2Op simulation environment show the effectiveness of the approach, which consistently outperforms the standard baseline commonly adopted in the field. Additionally, the proposed model proves to be much more computationally efficient than the simulation-based Expert method.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2509.02861

Country: Europe (0.68)

Genre:

Overview (0.68)
Research Report (0.64)

Industry: Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches

Tan, Jiejun, Dou, Zhicheng, Yu, Yan, Cheng, Jiehan, Ju, Qiang, Xie, Jian, Wen, Ji-Rong

arXiv.org Artificial IntelligenceAug-12-2025

Recently, large reasoning models have demonstrated strong mathematical and coding abilities, and deep search leverages their reasoning capabilities in challenging information retrieval tasks. Existing deep search works are generally limited to a single knowledge source, either local or the Web. However, enterprises often require private deep search systems that can leverage search tools over both local and the Web corpus. Simply training an agent equipped with multiple search tools using flat reinforcement learning (RL) is a straightforward idea, but it has problems such as low training data efficiency and poor mastery of complex tools. To address the above issue, we propose a hierarchical agentic deep search framework, HierSearch, trained with hierarchical RL. At the low level, a local deep search agent and a Web deep search agent are trained to retrieve evidence from their corresponding domains. At the high level, a planner agent coordinates low-level agents and provides the final answer. Moreover, to prevent direct answer copying and error propagation, we design a knowledge refiner that filters out hallucinations and irrelevant evidence returned by low-level agents. Experiments show that HierSearch achieves better performance compared to flat RL, and outperforms various deep search and multi-source retrieval-augmented generation baselines in six benchmarks across general, finance, and medical domains.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.08088

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report (0.64)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Electric Bus Charging Schedules Relying on Real Data-Driven Targets Based on Hierarchical Deep Reinforcement Learning

Qi, Jiaju, Lei, Lei, Jonsson, Thorsteinn, Hanzo, Lajos

arXiv.org Artificial IntelligenceMay-16-2025

A Markov Decision Process (MDP) is conceived, where the time horizon includes multiple charging and operating periods in a day, while each period is further divided into multiple time steps. To overcome the challenge of long-range multi-phase planning with sparse reward, we conceive Hierarchical DRL (HDRL) for decoupling the original MDP into a high-level Semi-MDP (SMDP) and multiple low-level MDPs. The Hierarchical Double Deep Q-Network (HDDQN)-Hindsight Experience Replay (HER) algorithm is proposed for simultaneously solving the decision problems arising at different temporal resolutions. As a result, the high-level agent learns an effective policy for prescribing the charging targets for every charging period, while the low-level agent learns an optimal policy for setting the charging power of every time step within a single charging period, with the aim of minimizing the charging costs while meeting the charging target. It is proved that the flat policy constructed by superimposing the optimal high-level policy and the optimal low-level policy performs as well as the optimal policy of the original MDP . Since jointly learning both levels of policies is challenging due to the non-stationarity of the high-level agent and the sampling inefficiency of the low-level agent, we divide the joint learning process into two phases and exploit our new HER algorithm to manipulate the experience replay buffers for both levels of agents. Numerical experiments are performed with the aid of real-world data to evaluate the performance of the proposed algorithm. Index T erms Deep Reinforcement Learning; Electric Bus; Hierarchical Reinforcement learning; Charging Control A CRONYMS DDQN Double Deep Q Network DNN Deep Neural Network DQL Double Q-learning DQN Deep Q Network DRL Deep Reinforcement Learning EB Electric Bus EBCSP Electric Bus Charging Scheduling Problem GA Genetic Algorithm GHG GreenHouse Gas HAC Hierarchical Actor-Critic HDDQN Hierarchical Double Deep Q Network HDRL Hierarchical Deep Reinforcement Learning HER Hindsight Experience Replay HRL Hierarchical Reinforcement Learning IA Immune Algorithm MDP Markov Decision Process MILP Mixed Integer Linear Programming ML Machine Learning MRP Markov Reward Process RES Renewable Energy Source RL Reinforcement Learning RO Robust Optimization RTEM Real Time Energy Market 1 J. Qi and L. Lei are with the School of Engineering, University of Guelph, Guelph, ON N1G 2W1, Canada, jiaju@uoguelph.ca;

machine learning, reinforcement learning, time step, (19 more...)

arXiv.org Artificial Intelligence

2505.10262

Country:

North America > Canada > Ontario > Wellington County > Guelph (0.24)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning

Wan, Ziyu, Li, Yunxiang, Song, Yan, Wang, Hanjing, Yang, Linyi, Schmidt, Mark, Wang, Jun, Zhang, Weinan, Hu, Shuyue, Wen, Ying

arXiv.org Artificial IntelligenceMar-14-2025

Recent research on Reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking -- enabling models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Through iterative reinforcement learning with aligned objectives, these agents explore and learn collaboration, leading to improved generalization and robustness. Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competitive-level mathematical benchmarks and LLM-as-a-Judge benchmarks. Comprehensive ablation studies further illustrate the evolving dynamics of each distinct agent, providing valuable insights into how the meta-thinking reasoning process enhances the reasoning capabilities of LLMs.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.09501

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada > British Columbia (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

Uncertainty-Aware Critic Augmentation for Hierarchical Multi-Agent EV Charging Control

Ting, Lo Pang-Yun, Şenol, Ali, Wang, Huan-Yang, Lai, Hsu-Chao, Chuang, Kun-Ta, Liu, Huan

arXiv.org Artificial IntelligenceDec-23-2024

The advanced bidirectional EV charging and discharging technology, aimed at supporting grid stability and emergency operations, has driven a growing interest in workplace applications. It not only effectively reduces electricity expenses but also enhances the resilience of handling practical issues, such as peak power limitation, fluctuating energy prices, and unpredictable EV departures. However, existing EV charging strategies have yet to fully consider these factors in a way that benefits both office buildings and EV users simultaneously. To address these issues, we propose HUCA, a novel real-time charging control for regulating energy demands for both the building and electric vehicles. HUCA employs hierarchical actor-critic networks to dynamically reduce electricity costs in buildings, accounting for the needs of EV charging in the dynamic pricing scenario. To tackle the uncertain EV departures, a new critic augmentation is introduced to account for departure uncertainties in evaluating the charging decisions, while maintaining the robustness of the charging control. Experiments on real-world electricity datasets under both simulated certain and uncertain departure scenarios demonstrate that HUCA outperforms baselines in terms of total electricity costs while maintaining competitive performance in fulfilling EV charging requirements. A case study also manifests that HUCA effectively balances energy supply between the building and EVs based on real-time information.

artificial intelligence, machine learning, real time system, (17 more...)

arXiv.org Artificial Intelligence

2412.18047

Genre: Research Report (0.50)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

Add feedback

Hierarchical Multi-Agent DRL Based Dynamic Cluster Reconfiguration for UAV Mobility Management

Meer, Irshad A., Besser, Karl-Ludwig, Ozger, Mustafa, Schupke, Dominic, Poor, H. Vincent, Cavdar, Cicek

arXiv.org Artificial IntelligenceDec-5-2024

Multi-connectivity involves dynamic cluster formation among distributed access points (APs) and coordinated resource allocation from these APs, highlighting the need for efficient mobility management strategies for users with multi-connectivity. In this paper, we propose a novel mobility management scheme for unmanned aerial vehicles (UAVs) that uses dynamic cluster reconfiguration with energy-efficient power allocation in a wireless interference network. Our objective encompasses meeting stringent reliability demands, minimizing joint power consumption, and reducing the frequency of cluster reconfiguration. To achieve these objectives, we propose a hierarchical multi-agent deep reinforcement learning (H-MADRL) framework, specifically tailored for dynamic clustering and power allocation. The edge cloud connected with a set of APs through low latency optical back-haul links hosts the high-level agent responsible for the optimal clustering policy, while low-level agents reside in the APs and are responsible for the power allocation policy. To further improve the learning efficiency, we propose a novel action-observation transition-driven learning algorithm that allows the low-level agents to use the action space from the high-level agent as part of the local observation space. This allows the lower-level agents to share partial information about the clustering policy and allocate the power more efficiently. The simulation results demonstrate that our proposed distributed algorithm achieves comparable performance to the centralized algorithm. Additionally, it offers better scalability, as the decision time for clustering and power allocation increases by only 10% when doubling the number of APs, compared to a 90% increase observed with the centralized approach.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2412.16167

Country: Europe > Denmark (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology (0.48)
Aerospace & Defense (0.34)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback